Evaluation Corpus for Restricted-Domain Question-Answering Systems for the Holy Quran
Abstract
This paper presents the compilation of a corpus of question-answer pairs for the Holy Quran. The corpus, named the Quran Arabic-English Question and Answer Corpus (QAEQ&AC), was manually collected from a wide range of sources. QAEQ&AC is a written, bilingual corpus comprising Arabic and English text. First, question-answer pairs were collected from several trusted expert sources. The data were then merged and cleaned using Microsoft Excel. Finally, the data were converted to a format suitable for mining tools, namely a comma-separated values (CSV) file. The resulting corpus consists of more than 1,500 question-answer pairs, amounting to nearly 50,000 words, divided between Arabic and English. It includes different question types, such as what, when, and why, and answers of varying length. We anticipate that the current and subsequent versions of the corpus will be a valuable evaluation resource for computational linguists investigating Quranic question answering; it might be used as a gold standard in research on natural language processing, information retrieval, and artificial intelligence. The corpus can also be annotated to derive linguistic information, such as morphological, syntactic, semantic, and lexical information.
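As a rough illustration of how a CSV file of bilingual question-answer pairs might be consumed by an evaluation script, the following minimal Python sketch loads such a file and tallies question types and word counts. The abstract does not specify the CSV schema, so the column names (`question_ar`, `answer_ar`, `question_en`, `answer_en`), the two sample rows, and the helper functions are all assumptions made for this example; they are not taken from the corpus itself.

```python
import csv
import io
from collections import Counter

# Hypothetical schema and placeholder rows: the source does not state the
# actual column names, and these rows are NOT taken from QAEQ&AC.
SAMPLE_CSV = "\n".join([
    "question_ar,answer_ar,question_en,answer_en",
    "ما هي أطول سورة في القرآن؟,سورة البقرة,"
    "What is the longest surah in the Quran?,Surah Al-Baqarah",
    "كم عدد سور القرآن؟,114 سورة,"
    "How many surahs are in the Quran?,114 surahs",
])

# Question words used for a simple first-word classification (an assumption).
QUESTION_WORDS = ("what", "when", "why", "who", "where", "how", "which")


def load_pairs(text):
    """Parse CSV text into a list of dicts, one per question-answer pair."""
    return list(csv.DictReader(io.StringIO(text)))


def question_type(question):
    """Classify an English question by its first word, else 'other'."""
    words = question.strip().lower().split()
    first = words[0] if words else ""
    return first if first in QUESTION_WORDS else "other"


def summarize(pairs):
    """Return (question-type counts, total whitespace-separated word count)."""
    types = Counter(question_type(p["question_en"]) for p in pairs)
    word_count = sum(len(value.split()) for p in pairs for value in p.values())
    return types, word_count


pairs = load_pairs(SAMPLE_CSV)
types, word_count = summarize(pairs)
print(len(pairs), dict(types), word_count)
```

Classifying by the first English word is deliberately crude; a real evaluation would likely need a more careful treatment of question types, especially for the Arabic side.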
Similar resources
Al-Bayan: An Arabic Question Answering System for the Holy Quran
Recently, Question Answering (QA) has been one of the main focuses of natural language processing research. However, Arabic Question Answering is still not in the mainstream. The challenges of the Arabic language and the lack of resources have made it difficult to provide Arabic QA systems with high accuracy. While low accuracies may be accepted for general purpose systems, it is critical in some...
A Question Answering System on Holy Quran Translation Based on Question Expansion Technique and Neural Network Classification
Corresponding Author: Suhaib Kh. Hamed Center for Artificial Intelligence Technology (CAIT), Faculty of Information Science and Technology, University Kebangsaan Malaysia, Bangi, 43600, Selangor, Malaysia Tel:0060-1139044355 Email: [email protected] Abstract: In spite of great efforts that have been made to present systems that support the user’s need of the answers from the Holy Qu...
Presenting a Religious Question-Answer Corpus in the Persian Language
Question answering is a field of natural language processing and information retrieval that has attracted researchers' attention in recent decades. Owing to the growing interest in this field, the need for appropriate data sources has become apparent. Most research on developing question answering corpora has so far been done in English, but in other languages, such as Persian, the lack of these co...
Ontology-based Query Expansion for Arabic Text Retrieval
Semantic resources are important parts of Information Retrieval (IR) applications such as search engines and Question Answering (QA); these resources should be available, readable, and understandable. In the semantic web, the ontology plays a central role in information retrieval, where it is used to retrieve more relevant information from unstructured information. This paper presents a semantic-based ...
An Arabic Natural Language Interface System for a Database of the Holy Quran
At present, the need to search the words, objects, subjects, and statistics of words and parts of the Holy Quran has grown rapidly, along with the growth in the number of Muslims and the widespread use of smartphones, tablets, and laptops. Because databases are used in almost all activities of our lives, some databases have been built to store information about the words and surahs of the Quran. T...